
Advanced Architectures and Memory Networks

Model overview and combinations, Dynamic memory networks. CS224n lecture 16.

Model overview and combinations

Model comparison:

  • Bag of Vectors: Surprisingly good baseline for simple text classification problems, especially if followed by a few ReLU layers (see the sketch after this list)
  • Window Model: Good for single word classification for problems that do not need wide context, e.g. POS
  • CNNs: good for classification, unclear how to incorporate phrase level annotation (can only take a single label), need zero padding for shorter phrases, hard to interpret, easy to parallelize on GPUs, can be very efficient and versatile
  • Recurrent Neural Networks: Cognitively plausible (reading from left to right, keeping a state), not the best choice for pure classification, slower than CNNs, can do sequence tagging and classification, very active research, work amazingly well with attention mechanisms
  • TreeRNNs: Linguistically plausible, hard to parallelize, tree structures are discrete and harder to optimize, need a parser
  • Combinations and extensions!
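
For the first bullet, here is a minimal PyTorch-style sketch of a bag-of-vectors classifier: average the word vectors, then a couple of ReLU layers. The class name, dimensions, and the choice of two hidden layers are illustrative assumptions, not from the lecture.

```python
import torch
import torch.nn as nn

class BagOfVectors(nn.Module):
    """Average the word vectors, then a small ReLU MLP on top (illustrative sketch)."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        vectors = self.embed(token_ids)        # (batch, seq_len, embed_dim)
        averaged = vectors.mean(dim=1)         # "bag of vectors": word order is thrown away
        return self.mlp(averaged)              # class scores
```

In practice the embedding layer would typically be initialized from pretrained word vectors (e.g. GloVe) and fine-tuned.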

Rarely do we use the vanilla models as is.

TreeLSTMs

  • LSTMs are great
  • TreeRNNs can benefit from gates too -> TreeRNNs + LSTMs (node update equations sketched below)
  • Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks by Kai Sheng Tai, Richard Socher, Christopher D. Manning

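As a reference point, here is the Child-Sum TreeLSTM update from the Tai et al. paper, reproduced from memory (notation may differ slightly from the original). For node $j$ with input $x_j$ and children $C(j)$:

$$
\begin{aligned}
\tilde{h}_{j} &= \sum_{k \in C(j)} h_{k} \\
i_{j} &= \sigma\left(W^{(i)} x_{j} + U^{(i)} \tilde{h}_{j} + b^{(i)}\right) \\
f_{jk} &= \sigma\left(W^{(f)} x_{j} + U^{(f)} h_{k} + b^{(f)}\right) \\
o_{j} &= \sigma\left(W^{(o)} x_{j} + U^{(o)} \tilde{h}_{j} + b^{(o)}\right) \\
u_{j} &= \tanh\left(W^{(u)} x_{j} + U^{(u)} \tilde{h}_{j} + b^{(u)}\right) \\
c_{j} &= i_{j} \odot u_{j} + \sum_{k \in C(j)} f_{jk} \odot c_{k} \\
h_{j} &= o_{j} \odot \tanh(c_{j})
\end{aligned}
$$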

Quasi-Recurrent Neural Network


Neural Architecture Search (Google NAS)

  • The manual process of finding the best units requires a lot of expertise
  • What if we could use AI to find the right architecture for any problem?
  • Neural architecture search with reinforcement learning by Zoph and Le, 2016


LSTM Cell vs NAS Cell

Dynamic Memory Network

Question Answering

Architecture of DMN


On the left, the word vector of each word in each input sentence is fed into the GRU of the Input Module. The Question Module is likewise a GRU, and the two GRUs can share weights.

The Question Module computes a question vector q. Using q, an attention mechanism looks back over the different time steps of the input; depending on the attention strength, some inputs are ignored while others are attended to. The attended inputs enter the Episodic Memory Module: if the question asks where the football is, for example, all inputs related to the football and to locations are passed into this module. Each hidden state of this module is fed into the Answer Module, and a softmax produces the answer sequence.

The two lines inside the Episodic Memory Module represent the memory from the first pass over the input with question q, and the memory from a second pass over the input with question q.
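
To make this flow concrete, here is a rough, illustrative Python sketch of one DMN forward pass. It simplifies under several assumptions: plain soft attention over the facts replaces the paper's attention-based GRU, the number of episodic passes is fixed, and the answer is a single token; all names and dimensions are made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DMNSketch(nn.Module):
    """Very rough sketch of the DMN flow: input GRU -> question GRU -> episodic memory -> answer."""

    def __init__(self, embed_dim=80, hidden_dim=80, vocab_size=10000, num_passes=2):
        super().__init__()
        self.input_gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.question_gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)  # weights could be shared with input_gru
        # Attention gate: a small two-layer net over similarity features of (fact, question, memory).
        self.gate = nn.Sequential(
            nn.Linear(4 * hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.memory_gru = nn.GRUCell(hidden_dim, hidden_dim)
        self.answer_out = nn.Linear(hidden_dim, vocab_size)
        self.num_passes = num_passes

    def forward(self, input_vectors, question_vectors):
        # input_vectors: (batch, T, embed_dim) word vectors of the input text
        # question_vectors: (batch, T_q, embed_dim) word vectors of the question
        facts, _ = self.input_gru(input_vectors)           # one hidden state per input position
        _, q = self.question_gru(question_vectors)
        q = q.squeeze(0)                                   # question vector = final hidden state

        memory = q                                         # the memory starts out as the question
        for _ in range(self.num_passes):                   # several passes ("episodes") over the input
            qs = q.unsqueeze(1)
            ms = memory.unsqueeze(1)
            z = torch.cat([facts * qs, facts * ms,
                           (facts - qs).abs(), (facts - ms).abs()], dim=-1)
            gates = torch.softmax(self.gate(z).squeeze(-1), dim=1)   # relevance of each fact
            episode = (gates.unsqueeze(-1) * facts).sum(dim=1)       # weighted summary of the facts
            memory = self.memory_gru(episode, memory)                # update memory with the episode

        return F.log_softmax(self.answer_out(memory), dim=-1)        # single-token answer, for simplicity
```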

The Modules: Input

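The notes give no detail here; in the original DMN paper, the Input Module runs a GRU over the word vectors of the (concatenated) input sentences, and the facts passed on to later modules are the hidden states at the end-of-sentence tokens:

$$
h_{t} = GRU(w_{t}, h_{t-1})
$$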

Further Improvement: BiGRU

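One common way to use the bidirectional GRU (the choice made in the follow-up DMN+ work, stated here as an assumption about what the figure shows) is to run the facts through a forward and a backward GRU and sum the two hidden states:

$$
s_{i} = \overrightarrow{h}_{i} + \overleftarrow{h}_{i}
$$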

The Modules: Question


$$
q_{t} = GRU(v_{t}, q_{t-1})
$$

The Modules: Episodic Memory


Gates are activated if the sentence is relevant to the question or to the memory:

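Roughly, following the original DMN paper (reconstructed here, with some feature terms of the similarity vector omitted): for fact $s_{i}$, question $q$, and previous memory $m^{t-1}$,

$$
\begin{aligned}
z_{i}^{t} &= \left[\, s_{i} \odot q ;\; s_{i} \odot m^{t-1} ;\; |s_{i} - q| ;\; |s_{i} - m^{t-1}| \,\right] \\
g_{i}^{t} &= \sigma\left(W^{(2)} \tanh\left(W^{(1)} z_{i}^{t} + b^{(1)}\right) + b^{(2)}\right) \\
h_{i}^{t} &= g_{i}^{t}\, GRU(s_{i}, h_{i-1}^{t}) + (1 - g_{i}^{t})\, h_{i-1}^{t} \\
e^{t} &= h_{T}^{t}, \qquad m^{t} = GRU(e^{t}, m^{t-1})
\end{aligned}
$$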

If the summary is insufficient to answer the question, repeat the sequence over the input (another pass).

The Modules: Answer


  • $a_{t}$ : the hidden state of the answer GRU at time step $t$
  • $y_{t-1}$ : the output from the previous time step

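Putting the two bullets together: in the DMN paper the Answer Module is another GRU, initialized with the final memory, whose previous output (concatenated with the question vector) is fed back in at each step:

$$
\begin{aligned}
a_{0} &= m^{T} \\
y_{t} &= softmax\left(W^{(a)} a_{t}\right) \\
a_{t} &= GRU\left(\left[\, y_{t-1} ; q \,\right],\; a_{t-1}\right)
\end{aligned}
$$
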
Modularization Allows for Different Inputs